An Empirical Study on DOACROSS Loops
Authors
Abstract
Loop-iteration-level parallelism is one of the most common forms of parallelism exploited by optimizing compilers and parallel machines. In this study, we selected six large application programs and used an execution-driven simulation technique from MaxPar [5] to identify and measure the effectiveness of concurrent DOACROSS loop execution. We found that executing DOACROSS loops serially can significantly degrade performance for some of the programs. We also studied the characteristics of the cross-iteration dependences in DOACROSS loops and measured the capability of a state-of-the-art parallelizing compiler, KAP, to identify and eliminate such dependences.
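For readers unfamiliar with the term, the following is a minimal sketch of what concurrent DOACROSS execution means; it is an illustration only and is not taken from the study, MaxPar, or KAP. The function name `doacross_prefix`, the loop body, and the cyclic assignment of iterations to workers are all assumptions made for the example. Each iteration has an independent part that may overlap with other iterations and a dependent part that must wait for the previous iteration (a cross-iteration dependence of distance 1).

```python
import threading

def doacross_prefix(b, workers=4):
    """Illustrative DOACROSS loop: a[i] = a[i-1] + b[i]**2.

    Hypothetical example, not code from the study. Iteration i may compute
    b[i]**2 at any time, but must wait for iteration i-1 before updating a[i].
    """
    n = len(b)
    a = [0] * n
    # One flag per iteration, playing the role of post/wait synchronization.
    done = [threading.Event() for _ in range(n)]

    def run(worker_id):
        # Cyclic assignment: worker w executes iterations w, w+workers, ...
        for i in range(worker_id, n, workers):
            local = b[i] * b[i]            # independent part, can overlap
            if i > 0:
                done[i - 1].wait()         # wait for the source of the dependence
            a[i] = (a[i - 1] if i > 0 else 0) + local
            done[i].set()                  # post: iteration i has produced a[i]

    threads = [threading.Thread(target=run, args=(w,)) for w in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return a
```

Calling `doacross_prefix([1, 2, 3, 4])` yields the same result as the serial loop. In CPython the global interpreter lock limits any real speedup, so the sketch shows only the synchronization pattern, not the performance effect measured in the study.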
Similar resources
Model-Driven Tile Size Selection for DOACROSS Loops on GPUs
DOALL loops are tiled to exploit DOALL parallelism and data locality on GPUs. In contrast, due to loop-carried dependences, DOACROSS loops must be skewed first in order to make tiling legal and exploit wavefront parallelism across the tiles and within a tile. Thus, tile size selection, which is performance-critical, becomes more complex for DOACROSS loops than DOALL loops on GPUs. This paper pr...
Redundant Synchronization Elimination for DOACROSS Loops
Synchronizations are necessary when there are dependences between concurrent processes. However, many synchronizations are redundant because the composite effect of the other synchronizations may have already covered them. The problem of redundant synchronization elimination in DOACROSS loops is investigated and an efficient algorithm that identifies redundant synchronizations in multiply-neste...
A Practical Approach to DOACROSS Parallelization
Loops with cross-iteration dependences (DOACROSS loops) often contain significant amounts of parallelism that can potentially be exploited on modern manycore processors. However, most production-strength compilers focus their automatic parallelization efforts on DOALL loops, and consider DOACROSS parallelism to be impractical due to the space inefficiencies and the synchronization overheads of ...
EXPLORER: Supporting Run-Time Parallelization of DO-ACROSS Loops on General Networks of Workstations
Performing runtime parallelization on general networks of workstations (NOWs) without special hardware or system software support is very difficult, especially for DOACROSS loops. With the high communication overhead on NOWs, there is hardly any performance gain for runtime parallelization, due to the latter's large amount of messages for dependence detection, data accesses, and computation sche...
On Effective Execution of Nonuniform DOACROSS Loops
It is extremely difficult to parallelize DOACROSS loops with non-uniform loop-carried dependences. In this paper, we present a static scheduling scheme with an accompanying synchronization strategy that can execute such DOACROSS loops effectively and efficiently. Our approach uses one of the parallelization techniques called Dependence Uniformization, which finds a small set of uniform dependen...